16:14
2026-07-03
the-decoder.com
ai-safety
UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do
The UK's AI Security Institute (AISI) found that standard AI benchmarks systematically underestimate agent capabilities by limiting compute budgets. In a study of seven benchmarks, increasing the toke…